JMIR Public Health and Surveillance — Latest Matching Preprints

1

Positive Registration Rate as a Key Determinant of COCOA Effectiveness: Empirical Evidence from Individual-Level Key-Match Data during the Sixth and Seventh COVID-19 Waves in Japan

Nakagawa, S.; Kumagai, S.; Yamamoto, A.

2026-05-08 health informatics 10.64898/2026.05.06.26352506 medRxiv

Top 0.1%

18.5%

Background: COCOA, Japan's Bluetooth-based COVID-19 contact tracing app, was widely regarded as ineffective due to persistently low key-match counts. However, this assessment may have conflated two distinct phenomena: (1) a structurally suppressed positive registration rate caused by administrative friction in the HER-SYS linkage, and (2) genuine epidemiological inefficacy. Objective: To empirically examine whether the correlation between individual COCOA key-match counts and regional COVID-19 case numbers depended on positive registration rate, using a unique longitudinal dataset from a single observer with a rigorously controlled behavioral pattern. Methods: The corresponding author (S.N.) recorded daily key-match counts from his personal iPhone from January 10 to October 8, 2022, encompassing the Sixth Wave (January 10-April 20, 2022) and Seventh Wave (July 9-September 2, 2022). Daily reported COVID-19 cases in Tokyo were obtained from publicly available NHK data. Pearson correlation coefficients were calculated for each wave period separately. Results: During the Sixth Wave, no meaningful correlation was observed between key-match counts and daily case numbers (r² = 0.018, p = 0.059, n = 194). In contrast, during the Seventh Wave, a strong positive correlation emerged (r² = 0.530, p < 0.001, n = 56). This correlation disappeared abruptly after September 12, 2022, coinciding with Japan's revision of the mandatory full case reporting (Zenshu Todokedashi) policy, which substantially reduced positive registrations in COCOA. Conclusions: COCOA's utility as an individual infection risk indicator was critically dependent on positive registration rate rather than app installation rate. These findings provide the first real-world empirical evidence supporting the threshold effect predicted by prior simulation studies, and offer important lessons for the design of digital tools in future pandemic preparedness.
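The per-wave analysis described in this abstract (a separate Pearson correlation for each period) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the record layout, and the toy values are hypothetical and are not the study's code or data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def r_squared_by_period(records):
    """Group (period, key_matches, daily_cases) records and return r^2 per period."""
    grouped = {}
    for period, matches, cases in records:
        xs, ys = grouped.setdefault(period, ([], []))
        xs.append(matches)
        ys.append(cases)
    return {period: pearson_r(xs, ys) ** 2 for period, (xs, ys) in grouped.items()}


# Made-up illustrative values (hypothetical), NOT the study's observations.
toy_records = [
    ("wave_a", 0, 100), ("wave_a", 1, 90), ("wave_a", 0, 120), ("wave_a", 1, 130),
    ("wave_b", 1, 1000), ("wave_b", 2, 2000), ("wave_b", 3, 3000), ("wave_b", 4, 4000),
]
result = r_squared_by_period(toy_records)
```

In the toy data, `wave_a` is constructed to be uncorrelated and `wave_b` perfectly linear, mimicking the qualitative contrast the abstract reports between the two waves.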

2

Comfort with AI for HIV Prevention Among Cisgender Women in New York City

Reyes Nieva, H.; Flanagan, M.; Huang, S.; Theodore, D. A.; Nkodo, A. F.; Parkinson, M.; Hill, S.; McAndrew, M.; Benitez, J. A.; Peralta, H.; Amesty, S.; Zucker, J. E.; Sobieszczyk, M.; Castor, D.

2026-06-03 health informatics 10.64898/2026.06.02.26354471 medRxiv

Top 0.1%

18.3%

Background: Long-acting pre-exposure prophylaxis (PrEP) expands HIV prevention options for women. However, PrEP impact depends on addressing persistent gaps in awareness, access, and use. Artificial intelligence (AI) tools, including conversational agents, are being explored to advance PrEP uptake, but comfort with AI may influence their impact. Thus, we examined women's comfort with AI and its association with PrEP awareness. Methods: We analyzed self-reported data from women aged ≥18 years in a cross-sectional survey conducted in New York City from August 2023 to August 2024. We performed descriptive analyses, applied latent class analysis to identify AI knowledge/comfort profiles, and estimated unadjusted and adjusted odds ratios to assess associations between profile membership and PrEP awareness. Results: Among 306 respondents without a diagnosis of HIV who completed AI-related survey items, the median age was 36. Most women identified as Hispanic/Latina (60%) or Non-Hispanic Black (18%), had not completed college (53%), and spoke only English or were bilingual (81%). Latent class analysis identified four AI knowledge/comfort profiles that differed by PrEP awareness, race/ethnicity, borough, prior drug use, and technology utilization. Women with varied AI knowledge, broad AI discomfort, and comfort with clinicians maintaining privacy had lower odds of PrEP awareness (OR: 0.35, 95% CI: 0.16-0.75), but this association did not persist after statistical adjustment. Conclusions: PrEP awareness and AI knowledge were limited, yet many women expressed openness to AI-enabled tools when privacy was assured. AI-enabled HIV prevention tools should prioritize trust, transparency, confidentiality, and the lived contexts of the women they intend to serve.

3

Perceptions of HPV Self-Collection for Cervical Cancer Screening Among Mobile Health Program Attendees

Tovar, A.; Person-Rennell, N.; Coronado, G.; Madhivanan, P.; Soto, S.; Escheman, H.; Morenz, A. M.

2026-05-04 primary care research 10.64898/2026.05.01.26352235 medRxiv

Top 0.1%

10.3%

Background: Mobile health programs (MHPs) provide essential preventive services to uninsured and underserved communities. Following the 2024 regulatory approval of human papillomavirus (HPV) self-collection for cervical cancer screening, MHPs represent an access point for healthcare-based self-collection. However, little is known about patient perceptions of this approach in MHP and other healthcare settings. Methods: From May to August 2025, we surveyed individuals aged 25-65 years with a cervix who attended MHPs in Southern Arizona. The survey assessed interest in HPV self-collection, preferred locations, instructional preferences, and facilitators to attend follow-up after a positive result. Descriptive statistics summarized demographic characteristics and survey responses. Results: Fifteen female participants completed the survey (mean age 36 years). Ten (67%) identified as Hispanic or Latino, nine (60%) preferred Spanish, and 14 (93%) were uninsured. Interest in HPV self-collection was high, with ten (67%) very or extremely interested. Among those interested, nine (69%) preferred home-based self-collection, and four (31%) preferred clinic or MHP-based self-collection. The most common concerns regarding self-collection on the MHP were ensuring privacy (n=7; 47%) and knowing how to perform the test correctly (n=5; 33%). Most participants (n=11; 73%) reported being very or extremely confident they would attend follow-up after a positive result; language-concordant support, reminder calls, and scheduling assistance were the most endorsed facilitators. Conclusion: HPV self-collection was highly acceptable among MHP attendees, although home-based self-collection was most preferred. Addressing privacy concerns, providing multiple modes of instruction, and offering navigation support may improve implementation success and ensure timely follow-up care in MHP settings.

4

Analytical Centralization of Health Expenditure at the National Administrator of Health System Resources: Architecture, Data Quality, and Operational Performance of the ADRES Health System Analytics Platform, Colombia

Garavito Jimenez, D. A.; Bello Angulo, D. E.; Mejia Lemus, L. T.; Chipatecua, D.; Fula, D. D.; Perez-Rubiano, S.; Martinez, F. L.; Bohorquez Pinzon, J. C.

2026-06-10 public and global health 10.64898/2026.06.08.26355159 medRxiv

Top 0.1%

8.2%

Background Between 2024 and 2025, Colombia universalized the Electronic Health Invoice with embedded Individual Health Services Delivery Records (RIPS), together known as FEV-RIPS, as the standard for financial and clinical data exchange. ADRES -- the entity responsible for administering the resources of Colombia's General Social Security Health System -- faced the challenge of processing information from multiple heterogeneous sources generated by more than 55,000 healthcare providers. Health systems in high-income countries converge clinical-financial data in consolidated platforms; Colombia started from a fragmented architecture with incompatible historical sources, no cross-database standardization, and no centralized analytical infrastructure until 2023. Objective We describe the design, technical challenges of integrating heterogeneous data, and operational performance of the analytical infrastructure built by ADRES to centralize large-scale processing of Colombian health system information, and derive transferable lessons for health system resource administrators in Latin America facing equivalent digitalization mandates. Methods Technical-descriptive report based on operational metrics from the ADRES Azure/Databricks environment during January-November 2025. We report indicators of data volume, processing speed, computational capacity, concurrent use by functional group, and governance structure. The architecture integrates VPN connectivity with MinSalud, automated processing of multiple formats (XML, relational tables, flat files), and a medallion data lake (Bronze/Silver/Gold). Data quality challenges include structural inconsistencies across sources, coding incompatibilities (municipalities, dates, diagnoses), format heterogeneities in unstructured data, and absent technical documentation.
Results The platform manages 21 catalogs, 1,183 tables, and over 110,645 million stored records, with cumulative production exceeding 1 trillion processed records. It executes queries on 100 billion records in ten seconds using clusters of up to 32 TB RAM and 4,096 vCPU. During September-October 2025, monthly query peaks reached 78,028 across eleven functional groups. Integration required Python/PySpark parsers for variable-depth XML, equivalence tables for incompatible municipality codes, cleaning routines for extreme dates used as nulls (1900-01-01, 9999-12-31), and transformation logic bridging classic RIPS and FEV-RIPS. The platform supported econometric analyses, judicial mandate responses, and public interactive dashboards. Conversational AI integration (Genie, Copilot) extends analytical access to users without SQL knowledge. Conclusions ADRES built in one year an analytical infrastructure that provides, to our knowledge, the first published documentation of the systemic technical challenges of integrating heterogeneous data sources in a middle-income social security health system. Centralizing health system information at national scale is technically feasible under public institutional constraints -- but requires solving cross-source standardization problems the implementation literature does not document with quantitative precision. The derived lessons are transferable to health system resource administrators in Latin America facing equivalent challenges.

5

A Deep Learning-Based Predictive Algorithm for Metabolic Syndrome Detection in the U.S. Population

Correa Segade, C.; Solozabal, R.; Hammouri, Z. A. A.; Gomez-Peralta, F.; Rossman, H.; Vidal, J. C.; Klonoff, D. C.; Segal, E.; Matabuena, M.

2026-06-02 endocrinology 10.64898/2026.05.24.26354007 medRxiv

Top 0.1%

8.2%

Objective To develop clinically operational, population-representative risk-score models for detecting metabolic syndrome (MetS) in U.S. adults by incorporating the NHANES survey design. Research Design and Methods We analyzed 36,812 U.S. adults from NHANES 1988-2018. Seven models of increasing clinical complexity were trained and evaluated, ranging from basic demographics to full biochemical panels. We used a new deep-learning methodology for survey data with a predictive uncertainty quantification model. Results A model combining anthropometrics, vital signs and a basic lipid panel achieved an AUC of 0.923 at an estimated cost of EUR 0.40 per individual. Adding diabetes-specific biomarkers, including fasting plasma glucose (FPG) and glycated hemoglobin (HbA1c), yielded only marginal improvements. Conclusions This low-cost population-representative screening tool for MetS may help identify at-risk individuals and support data-driven public health interventions.

6

Estimating the mpox vaccine uptake among MSM and modelling the potential of future vaccination campaigns in the EU/EEA

Prasse, B.; Hansson, D.; Aphami, L.; Jonas, K. J.; Borrel Pique, J.; Andrianou, X.; Pharris, A.; Plachouras, D.; Schmidt, A. J.; Nerlander, L.

2026-04-18 public and global health 10.64898/2026.04.16.26350851 medRxiv

Top 0.1%

7.2%

In October 2025, mpox virus clade I infections were detected among men who have sex with men (MSM) in the EU/EEA, suggesting local transmission in MSM sexual networks. Given the large outbreak of mpox among MSM in 2022 and the uncertain transmission parameters of clade I in the European context, clade I poses a public health concern to the EU/EEA. This work assesses the potential effect of increasing the mpox vaccine uptake among MSM via two contributions. First, building on the European MSM and Trans Persons Internet Survey 2024, we estimate the mpox vaccine uptake among MSM as well as the proportion who are unvaccinated but willing to get vaccinated for 28 countries in the EU/EEA. Specifically, we fit Bayesian mixed-effects models for the vaccine and recovery status of an individual depending on their number of sexual partners and country. Second, we develop a susceptible-infectious-recovered model on a sexual contact network to estimate the reduction of the reproduction number if vaccines are provided to MSM who are willing to get vaccinated. Our results suggest a substantial willingness for mpox vaccination among MSM if mpox cases increase and a large reduction of the effective reproduction number if this willingness is met. These findings highlight a large potential of increasing mpox vaccine uptake among MSM and preventing future mpox outbreaks in the EU/EEA.

7

A unified modeling platform for informing cervical cancer prevention policy decisions in 132 low- and middle-income countries

Man, I.; Macacu, A.; Eynard, M.; Adhikari, I.; Gini, A.; Georges, D.; Baussano, I.

2026-03-20 public and global health 10.64898/2026.03.18.26348700 medRxiv

Top 0.1%

7.1%

Background: Public health decision modelling tools designed to inform cervical cancer prevention policies in low- and middle-income countries (LMICs) are useful but scarce. Important challenges are the often missing or inconsistently collected cervical cancer epidemiological data, and the lack of a systematic approach to deal with such data limitations. Methodology/Principal Findings: We developed a unified modelling platform and workflow to enable cervical cancer modelling in 132 LMICs based on the previously developed footprinting approach, through the following steps: 1) With sexual behavior data from the Demographic Health Surveys (DHS), which were available for a large number of LMICs (70/132), we identified clusters of countries which represent distinct patterns of human papillomavirus (HPV) transmission. The 7 resulting clusters correspond to a gradient of HPV prevalence and cervical cancer risk and exhibit clear geographical separation. 2) The remaining LMICs were classified into the identified clusters based on geographical proximity so that each LMIC was assigned to a cluster. Goodness of classification was validated using available epidemiological data. 3) We then calibrated the HPV transmission and cervical cancer progression models of the IARC/WHO METHIS platform to the 132 LMICs, first by cluster then by country, using the available data on sexual behavior (from DHS), HPV prevalence (from literature search), and cervical cancer incidence (from GLOBOCAN). Conclusions/Significance: A unified workflow and platform designed by IARC/WHO for public health decision modelling of cervical cancer prevention in 132 LMICs is now available. It is ready to be used to support global and local stakeholders to coordinate, design, and implement impactful and efficient prevention policies and will help to accelerate cervical cancer elimination.

8

The Influence of Polypharmacy on Type 2 Diabetes Adverse Cardiovascular Outcomes in a Rural Cohort

Li, J. W.; Crew, L. A.; Cox, T. M.; Canine, B. F.

2026-04-03 endocrinology 10.64898/2026.04.02.26350053 medRxiv

Top 0.1%

7.1%

Objective: In this study, we utilized a large-scale clinical database to evaluate the relationship between polypharmacy and adverse outcomes among type 2 diabetes patients in rural Montana to inform strategies that improve adherence, reduce preventable complications, and promote equitable diabetes care in underserved regions. Research Design and Methods: 591 patients from the Big Sky Care Connect Database (BSCC) with type 2 diabetes and medication history were stratified into 3 cohorts based on prescribed number of medications: (1-4 medications, non-polypharmic), (5-9 medications, polypharmic), and (≥10 medications, hyperpolypharmic). Each cohort was examined for Major Adverse Cardiovascular Events (MACE) and Diabetes Complication Severity Index (DCSI). Descriptive statistics, multivariate logistic regressions, linear regression, and Poisson regression analyses were performed. Results: Medication count was associated with male gender (β = -2.1341, p < 0.001). Both medication count (IRR 1.06 per additional medication, p < 0.001) and age (IRR 1.03 per year, p < 0.001) were significant predictors of MACE. Neuropathy and nephropathy prevalence was statistically significant (p < 0.001) across patient cohorts and increased with medication count.

9

Stigmatizing Language Detection in Opioid Use Disorder Patient-Directed Discharge Clinical Documentation: A Privacy-Preserving Analysis Using a Locally Deployed Large Language Model

Izzo, J. A.; McIntyre, A. M.; Nguyen, J.; Bashaw, D.; Torrance, C. A.; Foster, J.

2026-06-01 health informatics 10.64898/2026.05.29.26354402 medRxiv

Top 0.1%

7.1%

Objective: Stigmatizing language in the electronic health record (EHR) has been associated with adverse patient experience in substance use disorder care, including opioid use disorder (OUD). This study evaluated a privacy-preserving, locally deployed large language model as a method to detect stigmatizing language in the documentation of OUD patients with patient-directed discharge (PDD). Methods: In a retrospective cohort study, 477 inpatient admissions from the MIMIC-IV database with a diagnosis of opioid use disorder were classified using a locally deployed Gemma-4-31b-it-bf16 model and a predefined 140-term lexicon to identify stigmatizing language in clinical documentation. Results: Analysis of clinical documentation showed stigmatizing language was present in 84.1% (190/226) of the PDD cohort vs 62.2% (156/251) of the non-PDD cohort, with an unadjusted odds ratio of 3.21 (95% CI 2.07-4.98; p < 0.0001). After adjustment for age, sex, insurance status, marital status, and race, PDD discharge remained an independent predictor of stigmatizing documentation (aOR 2.24, 95% CI 1.40-3.59; p < 0.0001). Further analysis of stigma intensity showed higher stigmatizing markers in the PDD cohort vs the non-PDD cohort (2.85 ± 2.39 vs 2.02 ± 2.44; p < 0.0001). Discussion and Conclusion: Stigmatizing language is detected with increased frequency and prevalence in clinical documentation of OUD patients who initiate PDD compared to those who adhere to standard discharge processes. A locally deployed large language model (LLM) offers a scalable, privacy-preserving method to audit clinical documentation for stigmatizing language.
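The unadjusted odds ratio quoted above can be recomputed from the two reported proportions (190/226 and 156/251). The abstract does not say how the confidence interval was obtained; the sketch below assumes a Wald (Woolf) interval on the log odds ratio, which appears to agree with the reported 3.21 (2.07-4.98) to rounding. The function name is hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.959964):
    """Odds ratio and Wald (Woolf) CI for a 2x2 table.

    a, b: events / non-events in the exposed group
    c, d: events / non-events in the comparison group
    z:    normal quantile (default gives a 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts taken from the abstract: stigmatizing language present / total.
pdd_present, pdd_total = 190, 226
non_pdd_present, non_pdd_total = 156, 251

or_est, ci_low, ci_high = odds_ratio_wald(
    pdd_present, pdd_total - pdd_present,
    non_pdd_present, non_pdd_total - non_pdd_present,
)
print(f"OR = {or_est:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

The adjusted odds ratio cannot be reproduced this way, since it depends on covariate-level data that the abstract does not provide.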

10

Variation in Telehealth Use in a National Home Test-to-Treat Program for Acute Respiratory Infections

Losos, W.; Wang, B.; Fisher, K.; O'Connor, L.; Soni, A.; Gerber, B.

2026-05-26 health informatics 10.64898/2026.05.24.26353984 medRxiv

Top 0.1%

7.0%

Background Home Test-to-Treat (HTTT) programs deliver timely antiviral treatment for acute respiratory infections, including COVID-19 and influenza, through at-home testing and telehealth. Because access is often measured by visit occurrence, variation in how and when care is delivered may be overlooked. We hypothesized that telehealth access follows distinct process-based patterns. Methods We analyzed de-identified encounters from the national HTTT program (September 2023-July 2024); 6,213 of 8,160 eligible individuals remained after exclusions for missing data. Phenotypes were derived by k-means clustering of standardized variables capturing encounter timing, modality preference, process duration, and sociodemographic and digital access attributes. Ten-day surveys assessed symptom duration and healthcare utilization. Results Three phenotypes emerged: Delayed/Disrupted Access (n = 1,537; 24.7%), Digitally Engaged but Socioeconomically Vulnerable (n = 1,460; 23.5%), and Mainstream Access and Efficient Utilization (n = 3,216; 51.8%). Mean process duration differed (15.93 [SD 3.84] vs 3.69 [3.31] vs 2.87 [2.41] hours; p < 0.001). Synchronous preference was lowest in the Digitally Engaged group (22.9%); antiviral prescribing was high (88.6%-91.9%). Among 10-day respondents (n = 1,023), symptom duration did not differ. Emergency department visits were most frequent in the Digitally Engaged group (2.3% vs 0.0% and 0.5%; p = 0.02) and urgent care in the Delayed/Disrupted group (5.8% vs 4.1% vs 2.0%; p = 0.02). Conclusions Telehealth use in a national HTTT program formed distinct phenotypes defined by timing, modality, and care-process efficiency. Evaluating equity requires attention to how and when care is delivered, not simply whether it occurred.

11

Persistent Proxy Discrimination in HIV Testing Prediction Models: A National Fairness Audit of 386,775 US Adults

Farquhar, H.

2026-03-16 health informatics 10.64898/2026.01.27.26344936 medRxiv

Top 0.1%

7.0%

Background: In clinical contexts where disease burden differs across demographic groups, enforcing demographic parity -- equal prediction rates regardless of group -- may reduce screening for the populations that need it most. We demonstrate this using HIV testing prediction as a case study. Methods: Using the Behavioral Risk Factor Surveillance System (BRFSS) 2024 dataset (N=386,775), we trained four classifiers to predict HIV testing uptake and evaluated disparities using demographic parity difference (DPD), equalized odds difference (EOD), and calibration across eight racial/ethnic groups. We applied threshold optimization and exponentiated gradient mitigation and quantified their impact on high-burden populations, including intersectional effects across race and sex. Results: Baseline selection rates ranged from 12.1% (Asian) to 66.0% (Black), mirroring differential HIV burden (DPD 0.519-0.634). Race-blind models retained 70% of baseline disparity through correlated social determinants. Enforcing demographic parity reduced Black true positive rates from 78.2% to 30.0% (61.6% relative decrease), causing 1,610 additional missed individuals. Race-only optimization worsened sex-based disparity by 71%; multi-objective optimization reduced intersectional DPD from 0.609 to 0.076 but at the same cost to high-burden groups. Exponentiated gradient AUC fell from 0.671 to 0.592 (11.8% relative decrease). Survey-weighted sensitivity analysis confirmed unweighted estimates underestimated disparities. Conclusions: Demographic parity is an inappropriate fairness criterion in differential-burden clinical contexts because it reduces screening access for high-risk populations. Fairness audits in healthcare should use need-appropriate metrics (equalized odds, calibration) rather than defaulting to demographic parity, and metric selection should involve clinician and community stakeholder deliberation.
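Two of the quantities in this abstract are simple arithmetic and can be checked directly: the demographic parity difference (largest minus smallest group selection rate) and the relative decreases in true positive rate and AUC. The sketch below is illustrative only. The helper names are hypothetical, and only the two extreme group rates named in the abstract are used, since the other six groups' rates are not given.

```python
def demographic_parity_difference(selection_rates):
    """Largest gap in selection rate between any two groups."""
    rates = list(selection_rates.values())
    return max(rates) - min(rates)


def relative_decrease(before, after):
    """Fractional drop from `before` to `after` (0.25 means a 25% decrease)."""
    return (before - after) / before


# The abstract reports only the lowest and highest baseline selection rates.
baseline_rates = {"Asian": 0.121, "Black": 0.660}
dpd = demographic_parity_difference(baseline_rates)

# Values quoted in the abstract for the cost of enforcing demographic parity.
tpr_drop = relative_decrease(0.782, 0.300)  # Black true positive rate
auc_drop = relative_decrease(0.671, 0.592)  # exponentiated gradient AUC

print(f"DPD = {dpd:.3f}, TPR drop = {tpr_drop:.1%}, AUC drop = {auc_drop:.1%}")
```

The resulting gap of 0.539 falls inside the reported DPD range of 0.519-0.634, and the two relative decreases round to the quoted 61.6% and 11.8%.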

12

Enhancing dengue diagnosis and surveillance by integrating machine learning technologies with the NS1 rapid test kit

Hwang, C.-K.; Chen, Y.-W.; WANG, Y.-T.; Ho, T.-S.; Oyang, Y.-J.

2026-05-06 health informatics 10.64898/2026.05.05.26352445 medRxiv

Top 0.1%

6.9%

Background: Dengue has been a major health threat globally in recent years. In particular, dengue incidence continues to increase annually and the epidemic area has expanded, primarily due to global warming. Therefore, effective case detection and surveillance strategies are crucial to tackle this global health challenge. In clinical practice, the rapid test kit that detects dengue non-structural protein 1 antigen, commonly referred to as the NS1 test, is widely employed for early diagnosis. However, real-world studies revealed that the sensitivity of the NS1 test kit ranged from approximately 61% to 95%. Since early diagnosis is critical for disease surveillance in the early stage of a dengue epidemic, scientists have been working hard to develop novel diagnosis methods that can provide higher sensitivity levels. Methodology/Principal Findings: In response to this challenge, we have developed a novel diagnosis procedure that integrates machine learning technologies with the NS1 test kit. Our experimental results revealed that the sensitivity of the dengue diagnosis procedure could be raised to higher than 99% by incorporating machine learning based prediction models to screen the suspected patients with a negative NS1 result. Furthermore, the relative risks between the suspected patients who were predicted to be positive and those who were predicted to be negative exceeded 4.8. Conclusions/Significance: These results illustrate that the proposed approach provides an effective and efficient diagnosis procedure to address the global health challenge caused by the spread of dengue. Author Summary: This study aimed to enhance surveillance of the dengue disease by integrating machine learning technologies with the rapid test kit commonly employed in early diagnosis. In clinical practice, the NS1 rapid test kit is widely employed for early diagnosis.
However, real-world studies revealed that a certain percentage of the patients with a negative NS1 test result, ranging from 5% to 39%, were actually infected by dengue. Since early diagnosis is critical for disease control in the early stage of a dengue epidemic, scientists have been working hard to tackle this challenge. Based on this observation, this study was launched to investigate the effects of incorporating machine learning based prediction models to further screen those patients with a negative NS1 test result. The experimental results revealed that the proposed approach was able to identify over 99% of the patients who were infected by the dengue disease. Furthermore, the risk of the suspected patients who were predicted to be positive was 4.8 times higher than the risk of those who were predicted to be negative. The experimental results illustrate that the proposed approach provides an effective and efficient diagnosis procedure to enhance surveillance of the dengue disease.

13

Comparative Evaluation of Logistic Regression and Gradient Boosting Models for Influenza Outbreak Early-Warning Using U.S. CDC ILINet Surveillance Data (2010-2025)

Onwuameze, C. N.; Madu, V.

2026-03-13 health informatics 10.64898/2026.03.05.26347655 medRxiv

Top 0.1%

6.5%

Background: Timely detection of seasonal influenza outbreaks is critical for healthcare system preparedness and public health response. Although numerous studies have examined short-term influenza forecasting, fewer have operationalized prediction as a binary early-warning problem linked to actionable surveillance thresholds. This study evaluated the performance of traditional and machine learning models for detecting national influenza outbreak weeks using U.S. Centers for Disease Control and Prevention (CDC) ILINet surveillance data. Methods: Weekly national ILINet data from 2010-2025 were analyzed. Outbreak weeks were defined as those in which weighted influenza-like illness (ILIPERCENT) exceeded the 90th percentile of the 2010-2017 training distribution (threshold = 3.3932%). Predictors included three-week lags of ILIPERCENT and percent positive laboratory specimens, along with seasonal harmonic terms. Models were trained on 2010-2017 data and evaluated on a temporally held-out 2020-2025 test period. Performance metrics included area under the receiver operating characteristic curve (AUC), precision-recall area under the curve (PR-AUC), sensitivity, specificity, precision, and F1-score. Findings: On the 2020-2025 test set, logistic regression achieved an AUC of 0.9964 and PR-AUC of 0.9868, with sensitivity of 1.0000 and specificity of 0.9516. XGBoost achieved an AUC of 0.9946 and PR-AUC of 0.9812, with sensitivity of 0.8939 and specificity of 0.9798. Both models demonstrated near-perfect discrimination between outbreak and non-outbreak weeks under strict temporal validation. Interpretation: National influenza outbreak early-warning can be implemented using publicly available CDC surveillance data with high discriminatory accuracy. Framing prediction as a threshold-based outbreak detection problem strengthens operational relevance and supports integration of predictive analytics into routine influenza surveillance and preparedness planning.
Author Summary: Seasonal influenza places a heavy burden on hospitals and communities each year, yet public health officials often rely on surveillance reports that describe what has already happened rather than signaling when activity is about to intensify. We examined whether routinely collected U.S. influenza surveillance data could be used to detect outbreak conditions earlier and more clearly. Using national data from the Centers for Disease Control and Prevention (CDC) covering 2010 to 2025, we compared a traditional statistical model with a machine learning approach to determine how accurately each could identify weeks when influenza activity exceeded a predefined outbreak threshold. Both approaches performed extremely well when tested on recent seasons, correctly distinguishing outbreak from non-outbreak weeks with high accuracy. Importantly, this framework translates weekly surveillance data into a practical alert signal rather than simply producing numerical forecasts. By linking model output to a clear outbreak definition, health departments and healthcare systems could use similar tools to support timely planning, communication, and resource allocation during influenza season.
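The outbreak definition in this abstract (weeks above the 90th percentile of a training distribution) and the lagged predictors can be sketched in a few lines. Everything here is an assumption for illustration: the function names and the toy series are hypothetical, "three-week lags" is read as lags 1 through 3, and the percentile uses linear interpolation, which the abstract does not specify.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def label_outbreak_weeks(series, threshold):
    """1 for weeks strictly above the threshold, else 0."""
    return [1 if value > threshold else 0 for value in series]


def lagged_rows(series, labels, n_lags=3):
    """Pair each week's label with the previous n_lags values (most recent first)."""
    rows = []
    for t in range(n_lags, len(series)):
        features = [series[t - k] for k in range(1, n_lags + 1)]
        rows.append((features, labels[t]))
    return rows


# Hypothetical weekly ILI percentages, NOT ILINet data.
training_ili = [1.0, 1.2, 1.5, 2.0, 2.8, 3.6, 4.1, 3.0, 2.1, 1.4, 1.1]

threshold = percentile(training_ili, 90)
labels = label_outbreak_weeks(training_ili, threshold)
rows = lagged_rows(training_ili, labels)
```

Each element of `rows` is a feature vector plus a binary target, which is the shape a logistic regression or gradient boosting classifier would consume; fitting those models is outside this sketch.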

14

CGM accuracy and reliability compared to point of care testing in older inpatients with comorbid type 2 diabetes and cognitive impairment

Donat-Ergin, B.; Mattishent, K.; Minihane, A. M.; Holt, R.; Murphy, H.; Dhatariya, K.; Hornberger, M.

2026-03-30 endocrinology 10.64898/2026.03.27.26349485 medRxiv

Top 0.2%

6.4%

Background: Older in-patients have a higher prevalence of diabetes and cognitive impairment. Cognitive impairment can make blood glucose management more challenging, since patients might not remember to measure blood glucose or report symptoms. Investigating the accuracy of continuous glucose monitoring (CGM) compared to usual care will inform clinical interpretations in this vulnerable population. Aim: To compare CGM derived glucose metrics and point-of-care tests (POCT) in older in-patients with T2DM and cognitive impairment and to investigate CGM accuracy compared to POCT in the hospital settings with the same population. Methods: Thirty-two older people with comorbid T2DM and cognitive impairment were recruited within a tertiary care hospital in the UK. All participants were naive to CGM and were asked to wear blinded Dexcom G7 sensors for up to 10 days. All participants received usual care in their hospital stay including the use of POCT. Key accuracy metrics comprised the mean absolute relative difference (MARD), median absolute relative difference (median ARD), and Clarke Error Grid (CEG), correlation (R2) analysis. In addition, the percentage of CGM readings falling within +/-20% of reference glucose values when the reference was >5.6 mmol/L, or within +/-1.1 mmol/L when the reference was <=5.6 mmol/L (+/-20%/1.1 mmol/L) was calculated to assess analytical and clinical accuracy. Results: Thirty participants completed the study. CGM derived mean glucose for time in range (TIR= 4-10 mmol/mol) was 36.23% (min= 0%, max= 90%), time above range (TAR >= 10 mmol/mol) was 62.87% and time below range (TBR <= 3.9 mmol/mol) was 1.03%. Mean TIR based on available POCT readings was 40.84%, TAR was 57.24% and TBR 1.81%, showing similar readings as CGM derived glucose metrics. Comparison of the two resulted in a MARD of 17.4%, and median ARD of 12.2% and the outcome of +/-20%/1.1 mmol/L analysis was 72.3%. 
CEG analysis revealed that 99.3% of the data points fell within the clinically acceptable zones (Zone A and Zone B), and there was a strong correlation (R2=0.82) between CGM and POCT. CGM captured more hypoglycaemic readings in our participants. Conclusion: Our study suggests that CGM- and POCT-derived glucose metrics are largely similar for in-patients with diabetes and cognitive impairment. CGM remains a safe and clinically acceptable tool, and was able to capture more nocturnal hypoglycaemia than POCT in a subgroup of patients. These initial findings show that CGM might be a viable alternative for people with comorbid T2DM and cognitive impairment.
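
The accuracy metrics named in this abstract (MARD, median ARD, and the +/-20%/1.1 mmol/L agreement rate) can be illustrated with a minimal Python sketch. The function name `accuracy_metrics`, the dictionary keys, and the sample pairs are hypothetical and are not taken from the study; the sketch assumes each CGM reading has already been matched to a POCT reference value in mmol/L.

```python
from statistics import mean, median

def accuracy_metrics(pairs):
    """Compute agreement metrics for (cgm, reference) glucose pairs in mmol/L.

    Hypothetical helper for illustration only; pairing of CGM and POCT
    readings is assumed to have been done beforehand.
    """
    # Absolute relative difference of each CGM reading, as a percentage
    # of its reference value.
    ards = [100 * abs(cgm - ref) / ref for cgm, ref in pairs]

    # +/-20%/1.1 mmol/L rule: within 20% of the reference when the
    # reference is above 5.6 mmol/L, otherwise within 1.1 mmol/L.
    within = [
        abs(cgm - ref) <= 0.2 * ref if ref > 5.6 else abs(cgm - ref) <= 1.1
        for cgm, ref in pairs
    ]

    return {
        "mard": mean(ards),
        "median_ard": median(ards),
        "agreement_20_1p1": 100 * sum(within) / len(within),
    }

# Made-up example pairs (CGM, POCT), not study data.
example = [(5.0, 4.0), (11.0, 10.0), (6.0, 8.0), (7.0, 7.0)]
result = accuracy_metrics(example)
```

MARD is sensitive to a few large discrepancies, which is one reason a median ARD is often reported alongside it, as this abstract does.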

15

A Prospective Observational Study on a Multimodal Non-Invasive Physiological Monitoring System (Hayl): Feasibility, Signal Characterization, and Exploratory Biomarker Correlation

Choda, G.; Choda, A.

2026-05-17 endocrinology 10.64898/2026.05.13.26353115 medRxiv

Top 0.2%

6.4%

Chronic conditions such as Type 2 Diabetes Mellitus (T2DM) and Hypertension (HTN) remain underdiagnosed in community settings, particularly in resource-limited populations. Conventional diagnostic approaches rely on episodic measurements and laboratory-based assessments, limiting scalability for large-scale screening. Non-invasive physiological monitoring systems offer a potential pathway for accessible and rapid wellness assessment in real-world environments. This study aimed to evaluate the feasibility, signal acquisition performance, and exploratory physiological signal characteristics of a non-invasive multimodal monitoring system (Hayl) in community-based screening settings. Methods: A prospective, cross-sectional, multicenter observational pilot study was conducted across rural and urban screening camps in south India. A total of 281 adult participants were enrolled, including individuals with known T2DM, HTN, and those without known comorbidities, encompassing both symptomatic and asymptomatic subjects. Physiological data were acquired using the Hayl system, which integrates photoplethysmography (PPG) and temperature sensing. Signal acquisition feasibility, waveform quality, and derived signal characteristics were evaluated. Comparative and exploratory analyses were performed across predefined clinical subgroups. The study was conducted under Institutional Ethics Committee approval in accordance with guidelines from the Indian Council of Medical Research. Conclusion: The Hayl system demonstrated high feasibility for physiological signal acquisition, with successful PPG recordings in 274 participants (97.5%) and temperature signals in 279 participants (99.3%). Most recordings exhibited high waveform quality (74.0%), with observable variations in signal characteristics across clinically relevant subgroups. 
Reduced pulse variability and increased waveform irregularity were more frequently observed in participants with T2DM and HTN, while symptomatic individuals demonstrated greater signal variability compared to asymptomatic participants. Temperature measurements were stable, with a mean peripheral temperature of 33.4 °C and a variation of 1.2 °C. These findings support the potential of Hayl as a non-invasive multimodal platform for community-based wellness screening and exploratory signal-based physiological assessment. Further large-scale and longitudinal studies are required to establish clinical utility.

16

Protocol for the REVELIO test-track pilot study: a randomised, controlled, single-centre trial in healthy recreational cannabis users investigating real-time in-vehicle detection of cannabis-impaired driving

Bechny, M.; Deuber, R.; Heck, C.; Brügger, J.; Pfäffli, M.; Jovanova, M.; Fleisch, E.; Wortmann, F.; Weinmann, W.

2026-05-01 health informatics 10.64898/2026.04.29.26352110 medRxiv

Top 0.2%

6.4%

Background: Driving under the influence of cannabis is associated with impaired cognitive and psychomotor performance and an increased risk of traffic accidents. Reliable real-time in-vehicle systems for detecting cannabis-related driving impairment are currently lacking but hold great potential for improving road safety. Methods: This protocol describes the REVELIO test-track pilot study: a randomised, controlled, open-label, interventional, single-centre trial. The study assesses the feasibility and methodological requirements for developing and evaluating a multimodal in-vehicle detection approach using vehicle and driver state data. A total of 45 healthy recreational cannabis users will be enrolled and randomly allocated to an intervention or a reference control group. During the main study day, all participants will undergo biological sampling for tetrahydrocannabinol (THC) and related metabolites, as well as pre-driving assessments, followed by a sober baseline driving session on a closed test track using a dual-pedal vehicle with a certified driving instructor onboard. Participants in the intervention group will then receive a single controlled inhalative cannabis dose (target 0.67 mg THC per kg body weight), while the reference group will receive no cannabis. All participants will subsequently complete three additional standardized 50-minute driving sessions at predefined time points up to approximately six hours after administration, following identical schedules to enable within- and between-group comparisons. Between driving sessions, structured breaks will include recovery periods, repeated biological sampling, and traffic-medical, traffic-psychological, and pre-driving performance assessments, to characterise the temporal dynamics of cannabis-related impairment.
Discussion: Multimodal data will be collected, including vehicle controller area network (CAN) data, driver monitoring camera (DMC) data, physiological signals using wearables, and biological samples (capillary blood, breath, oral fluid). Machine-learning-based models will be developed and evaluated to distinguish sober from cannabis-influenced driving states under controlled conditions. Secondary analyses will examine changes in driving performance over time and associations between functional measures and biological THC concentrations. As an exploratory pilot study conducted on a secured test track, the protocol aims to generate standardized reference data and quantitative performance metrics to inform both feasibility and system design considerations. Ethics and trial registration: The study was approved by the Cantonal Ethics Committee Bern, Switzerland (BASEC ID: 2025-01590) and is registered at ClinicalTrials.gov (NCT07401628).

17

Impact of prescription-free access to sexually transmitted infection screening tests in medical-biological laboratories: cross-sectional analysis of data from clinical laboratories in France.

Gil-Salcedo, A.; Gazzano, V.; Arsene, S.; Durand, A.; Roger, S.; Prots, L.; Laurencin, N.; Chanard, E.; Duez, A.; Le Naour, E.; Bausset, O.; Ghali, B.; Strzelecki, A.-C.; Felloni, C.; Levillain, R.; Fargeat, C.; Lefrancois, S.; Feuerstein, D.; Visseaux, B.; Escudie, L.; Visseaux, C.; Leclerc, C.; Haim-Boukobza, S.

2026-04-24 public and global health 10.64898/2026.04.23.26351562 medRxiv

Top 0.2%

6.4%

Background: Since September 2024, France has implemented a national reform allowing prescription-free access (PFA) to sexually transmitted infection (STI) screening in medical biological laboratories (MBLs). This study aims to characterize the populations undergoing STI testing according to their access modality and to evaluate the probability of test positivity in relation to testing pathway, sex, and age group. Methods: We conducted a cross-sectional analysis of all individuals screened for Chlamydia trachomatis, gonorrhoea, human immunodeficiency virus (HIV), hepatitis B virus (HBV), and syphilis by treponemal-specific immunoassay (TSI) in Cerballiance MBLs between March 2025 and February 2026. Multivariable logistic regression models stratified by sex and adjusted for age and region assessed associations between screening modality and STI positivity. Results: Among 1,008,737 individuals included, 27.8% were under PFA and 72.2% under prescription-based access (PBA). PFA users were more frequently male (47.4% vs. 36.3%, p<0.001) and aged 20-39 years (34.0%, p<0.001). Overall positivity rates differed by modality: PFA was associated with higher detection of Chlamydia (4.6% vs. 3.6%). The PBA group showed more positive cases of syphilis (3.4% vs. 1.2%), HBV (1.3% vs. 0.4%), and HIV infection (0.3% vs. 0.2%, all p<0.001). Co-infection and gonorrhoea proportions did not significantly differ between modalities. Conclusions: PFA substantially increased STI screening uptake, particularly among young adults and men, and enhanced detection of bacterial STIs. PBA remains essential for diagnosing viral and chronic infections. These findings highlight the complementary roles of both access strategies and support PFA screening as an effective public health intervention to broaden STI detection and reduce transmission.

18

Development and validation of a dynamic risk stratification tool for predicting multidrug-resistant bacterial infections in ICU patients: A clinical prediction model and web-based calculator

Ye, L.; Lyu, B.; Yang, Q.; Mou, X.; Nawawonganun, R.; Laohasiriwong, W.

2026-05-26 intensive care and critical care medicine 10.64898/2026.05.23.26353927 medRxiv

Top 0.2%

6.3%

Background: Multidrug-resistant bacterial (MDRB) infections in intensive care units (ICUs) substantially elevate patient mortality, prolong hospital stays, and impose heavy healthcare cost burdens. Existing predictive models for ICU-acquired MDRB infection predominantly focus on static admission-risk assessment, lacking the capacity to leverage longitudinal treatment data for dynamic risk re-stratification during the ICU stay. Meanwhile, most models suffer from poor clinical interpretability, overreliance on hard-to-collect biomarkers, or absence of deployable clinical tools, limiting real-world translation. Therefore, there is an urgent need to develop a parsimonious, interpretable tool based on routine cumulative data to guide timely intervention. This study aimed to develop an interpretable model with a web calculator to improve clinical applicability. Methods: We conducted a retrospective analysis of ICU inpatients at the First Affiliated Hospital of Dali University between January 1, 2023, and January 1, 2026. Using the createDataPartition function in R (random seed = 42), the dataset was stratified and divided into a training group and a validation group in a 7:3 ratio. Feature selection was performed using the Boruta algorithm to confirm the relevance of candidate variables. A multivariable logistic regression model was constructed and visualized as a nomogram, and its performance was compared with six machine learning algorithms (Random Forest, XGBoost, Neural Network, etc.). Model validation was conducted using receiver operating characteristic (ROC) curves, decision curve analysis (DCA), and SHAP value interpretation. Finally, an online R Shiny calculator was developed based on the final model. Results: A total of 3,631 patients were enrolled and divided into a training group (n=2,543) and a validation group (n=1,088) using stratified random sampling.
Five independent predictors were identified in the training group: hypertension combined with diabetes, antibiotic types, ventilator days, urinary catheter days, and number of abnormal PCT results. The logistic regression model achieved an AUC of 0.772 (95% CI: 0.733-0.812) in the validation group, outperforming XGBoost (0.763) and Random Forest (0.703). The model demonstrated excellent calibration (Hosmer-Lemeshow χ² = 1.94, P = 0.9829) and positive net clinical benefit across threshold probabilities of 0%-40%. SHAP analysis aligned with regression-derived variable importance rankings, confirming predictor contributions. An open-access online calculator was successfully deployed (https://dongfangshao666.shinyapps.io/MDR_shiny2/), enabling real-time individualized risk stratification at the bedside. Conclusion: This study developed and validated a dynamic, interpretable multidrug-resistant bacterial infection risk prediction model requiring only five routinely collected clinical indicators. The model balances robust predictive performance with high transparency, overcoming key limitations of prior tools. The accompanying web calculator supports dynamic risk reassessment throughout the ICU stay, facilitating precise antimicrobial stewardship, targeted infection control interventions, and optimized resource allocation, bridging the gap between statistical modeling and frontline clinical decision-making.
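
The stratified 7:3 partition with a fixed seed described in this abstract can be sketched in Python as follows. This is not the R function the authors used and will not reproduce their exact split; `stratified_split` and its parameters are hypothetical names, and the sketch only assumes a binary or categorical outcome label per patient.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.7, seed=42):
    """Split row indices into train/validation sets, class by class.

    Hypothetical illustration: each outcome class contributes about
    train_frac of its rows to the training set, so class proportions
    are roughly preserved in both groups.
    """
    rng = random.Random(seed)  # fixed seed makes the split repeatable

    # Group row indices by outcome class.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train, valid = [], []
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train.extend(idx[:cut])
        valid.extend(idx[cut:])
    return sorted(train), sorted(valid)

# Made-up labels: 30 infected (1) and 70 non-infected (0) patients.
labels = [1] * 30 + [0] * 70
train_idx, valid_idx = stratified_split(labels)
```

Stratifying on the outcome keeps the infection rate similar in the training and validation groups, which matters when the outcome is relatively rare.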

19

The SARS-CoV-2 Integrated Genomic Epidemiology Database (IGED): Linking viral genomes with patient-level metadata to advance statewide genomic surveillance in California

Ryder, R.; Elder, J.; Panditrao, M.; Grosgebauer, K.; Katz, R.; Tello, L.; Carroll, E.; Borthwick, D.; Kaur, C.; Smith, R.; Shiau, V.; Wheeler, W.; Reilly, E.; Myers, J.; Nelson, L.; Lim, E.; Arunleung, P.; Baylis, E.; Gilliam, S.; Hennesy-Burt, T.; Bregman, B.; Silver, E.; Kapsak, C.; Wright, S.; Leon, T.; Bell, J.; Morales, C.; Wadford, D. A.

2026-05-19 health informatics 10.64898/2026.05.14.26353263 medRxiv

Top 0.2%

6.3%

In July 2021, the California Code of Regulations Title 17 required all laboratories performing SARS-CoV-2 whole genome sequencing (WGS) to report their sequencing results to the California Department of Public Health (CDPH). These viral genomic data and patient metadata were compiled into the Integrated Genomic Epidemiology Database (IGED). Linking anonymized viral sequences with patient-level information enabled monitoring of infectiousness, pathogenicity, transmission dynamics, evolution, and vaccine evasion among emerging SARS-CoV-2 lineages. Laboratories performing SARS-CoV-2 WGS transmitted sequencing results to CDPH through Electronic Laboratory Reporting (ELR) and non-ELR pathways. CDPH applied uniform reporting requirements but allowed flexibility in specific data formats to accommodate diverse data systems. To preserve data quality and interoperability across heterogeneous sources, CDPH implemented standardization, validation, and deduplication protocols. Snowflake, a cloud-based data storage and analytics platform, and Posit Connect, a cloud deployment and automation platform, supported the management, processing, and integration of data within the IGED. The IGED established links between SARS-CoV-2 WGS data and epidemiologic metadata for 801,418 sequences, representing 81.7% of all sequences reported in California. Lineages reported to the IGED showed strong concordance with lineage proportions in GISAID. Sequences reported to the IGED had average turnaround times longer than one month, and the majority of sequencing was performed in Southern California and Los Angeles. The IGED enhanced genomic surveillance through predictive modeling and monitoring concerning evolutionary trends such as recombination and saltations in persistent infections. 
Development of the IGED highlighted the need for standardized data requirements, sustained funding for sequencing, incentives for data submission, and interdisciplinary collaboration to build an effective genomic surveillance system. This framework for linking genomic and epidemiologic data has not only generated critical insights for SARS-CoV-2 but also provided the foundation for CDPH and other public health organizations to develop similar IGED-like systems for other priority pathogens as genomic surveillance expands.

20

Assessing the Impact of Timing and Coverage of United States COVID-19 Vaccination Campaigns: A Multi-Model Approach

Nande, A.; Larsen, S. L.; Turtle, J.; Davis, J. T.; Bandekar, S. R.; Lewis, B.; Chen, S.; Contamin, L.; Jung, S.-m.; Howerton, E.; Shea, K.; Bay, C.; Ben-Nun, M.; Bi, K.; Bouchnita, A.; Chen, J.; Chinazzi, M.; Fox, S. J.; Hill, A. L.; Hochheiser, H.; Lemaitre, J. C.; Loo, S. L.; Marathe, M.; Meyers, L. A.; Pearson, C. A. B.; Porebski, P.; Przykucki, E.; Smith, C. P.; Venkatramanan, S.; Vespignani, A.; Willard, T. C.; Yan, K.; Viboud, C.; Lessler, J.; Truelove, S.

2026-04-08 public and global health 10.64898/2026.04.07.26349269 medRxiv

Top 0.2%

6.3%

Background: Six years after its emergence, SARS-CoV-2 continues to impose a substantial burden. The impact of vaccination and the optimal timing of its rollout remain uncertain given existing population immunity and variability in outbreak timing between summer and winter. Methods: The US Scenario Modeling Hub convened its 19th round of ensemble projections for COVID-19 hospitalizations and deaths in the United States, in which eight teams projected trajectories in each US state and nationally from April 2025 to April 2026 under five scenarios regarding vaccine recommendations and timing. Recommendations had two eligibility scenarios (high-risk individuals only and all-eligible) and two timing scenarios (classic start: mid-August, earlier start: late June). These were crossed to create four scenarios and were compared against a counterfactual scenario with no vaccination. Findings: Compared to no vaccination, our ensemble projections estimated 90,000 (95% PI 53,000 to 126,000) hospitalizations averted in the high-risk and classic timing scenario across the US. Expanding to all-eligible age-groups averted an additional 26,000 (95% PI 14,000 to 39,000) hospitalizations, which, when coupled with the early vaccination timing, was projected to further reduce national hospitalizations by 15,000 (95% PI -3,000 to 33,000). The majority of teams projected both summer and winter waves. Implications: We project COVID-19 will cause significant hospitalizations and deaths in the US in the 2025-26 season and estimate significant benefits from a broad all-eligible vaccination recommendation. The results also suggest an additional benefit is likely to be gained from an earlier vaccination campaign. Funding: Centers for Disease Control and Prevention; National Institutes of Health (US); National Science Foundation (US)